Dynamic k-NN with Attribute Weighting for Automatic Web Page Classification (Dk-NNwAW)
Abstract
The Internet has expanded explosively over the last decade and a half. As a vast array of authors add web pages on a wide range of topics to the World Wide Web, these pages must be organized so that searches return more relevant information. In this paper, a modified attribute-weighted dynamic k-Nearest Neighbor classification algorithm, using k-Means clustering, is proposed. Its adaptive, dynamic nature makes it a solution for the automatic classification of web pages on the WWW. Web pages are classified based on the class distribution of the pages in their neighborhood. Attribute weighting is used primarily to improve classification accuracy in cases of imbalanced class distribution. Empirical results show good classification accuracy, and the algorithm also improves on other shortcomings of the traditional k-NN classification model.
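The neighborhood vote and the attribute weighting described in the abstract can be illustrated with a minimal sketch. This is not the authors' Dk-NNwAW implementation: the abstract does not say how k is chosen dynamically, how k-Means clustering is used, or how the weights are derived, so the sketch assumes a fixed k and hand-picked weights. The function names, the two toy features (counts of the words "goal" and "stock"), and the class labels are all hypothetical.

```python
import math
from collections import Counter


def weighted_distance(a, b, weights):
    """Euclidean distance in which each attribute is scaled by its weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def knn_classify(query, samples, labels, weights, k=3):
    """Label a query by majority vote among its k nearest training samples.

    Assumption: k is fixed here; the paper's dynamic choice of k is not
    described in the abstract and is not reproduced.
    """
    ranked = sorted(
        range(len(samples)),
        key=lambda i: weighted_distance(query, samples[i], weights),
    )
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: each page is [count of "goal", count of "stock"].
samples = [[5, 0], [4, 1], [6, 1], [0, 5], [1, 4], [1, 6]]
labels = ["sports", "sports", "sports", "finance", "finance", "finance"]

# With equal weights, the page is assigned the majority class of its neighbors.
equal = [1.0, 1.0]
print(knn_classify([4, 0.5], samples, labels, equal))

# Changing the attribute weights changes which neighbors count as "near".
print(knn_classify([5, 5], samples, labels, [1.0, 0.0]))
print(knn_classify([5, 5], samples, labels, [0.0, 1.0]))
```

The last two calls classify the same page under two different weight vectors, which shows why the choice of weights matters: an attribute with zero weight has no influence on the neighborhood, so the vote can flip from one class to the other.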
Similar resources
Naïve Bayes vs. Decision Trees vs. Neural Networks in the Classification of Training Web Pages
Web classification has been attempted through many different technologies. In this study we concentrate on the comparison of Neural Networks (NN), Naïve Bayes (NB) and Decision Tree (DT) classifiers for the automatic analysis and classification of attribute data from training course web pages. We introduce an enhanced NB classifier and run the same data sample through the DT and NN classifiers ...
Classification of Web Documents Using a Graph Model
In this paper we describe work relating to classification of web documents using a graph-based model instead of the traditional vector-based model for document representation. We compare the classification accuracy of the vector model approach using the k-Nearest Neighbor (k-NN) algorithm to a novel approach which allows the use of graphs for document representation in the k-NN algorithm. The pr...
Web page feature selection and classification using neural networks
Automatic categorization is the only viable method to deal with the scaling problem of the World Wide Web (WWW). In this paper, we propose a news web page classification method (WPCM). The WPCM uses a neural network with inputs obtained by both the principal components and class profile-based features. Each news web page is represented by the term-weighting scheme. As the number of unique words...
Verbal intelligence identification based on text classification
This paper analyses and compares term weighting methods for automatic verbal intelligence identification from speech. Two different corpora are used; the first one contains monologues on the same topic; the second one contains dialogues between two or three people. The problem is described as a text classification task with two classes: low and high verbal intelligence. Seven different term wei...
An Improved Approach to Term Weighting in Hierarchical Web Page Classification
Currently, in web page classification, the Absolute Weighting Method is a common way to weight HTML main structure features. Its disadvantage is that the weighting coefficient is a fixed value, which affects long and short texts differently, so the influence of structure features on local text weakens as the local text grows longer. To solve the problem, we p...
Publication date: 2012